The Effect of the Used Resampling Technique and Number of Samples in Consolidated Trees’ Construction Algorithm
Authors
Abstract
In many pattern recognition problems, explaining the classification made is as important as the classifier's good performance in terms of discriminating capacity. For this kind of problem, the Consolidated Trees' Construction (CTC) algorithm can be used; it uses several subsamples to build a single tree. This paper presents an extensive analysis of the behavior of the CTC algorithm on 20 databases. The effect of two parameters of the algorithm has been analyzed: the number of samples and the way the subsamples are built. The results obtained with Consolidated Trees have been compared to those of C4.5 trees using five runs of 10-fold cross-validation. The comparison has been made from two points of view: error rate (accuracy) and complexity (explanation). Results show that, for subsamples of 75% of the training sample, Consolidated Trees built with 10 or more subsamples achieve, on average, smaller error rates than C4.5 trees with similar complexity; they are therefore better situated on the learning curve. On the other hand, the method used to build the subsamples clearly affects the quality of the results achieved with Consolidated Trees. When bootstrap samples are used to build the trees, the results are worse than those obtained with subsamples of 75% from both points of view: error and complexity.
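The sampling side of the experimental set-up described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it only generates index sets (five runs of 10-fold cross-validation and, within each training fold, either subsamples of 75% drawn without replacement or bootstrap samples). It does not implement the consolidation step of the CTC algorithm or the C4.5 comparison. The function names, the plain (non-stratified) fold assignment and the random seeds are hypothetical choices made for illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def subsample_75(train_idx: Sequence[int], rng: random.Random,
                 fraction: float = 0.75) -> List[int]:
    """Subsample WITHOUT replacement: round(fraction * n) distinct training examples."""
    k = round(fraction * len(train_idx))
    return rng.sample(list(train_idx), k)


def bootstrap(train_idx: Sequence[int], rng: random.Random) -> List[int]:
    """Bootstrap sample: n draws WITH replacement, so some examples repeat."""
    return rng.choices(list(train_idx), k=len(train_idx))


# Hypothetical names for the two subsample-building methods compared in the abstract.
SAMPLERS = {"subsample75": subsample_75, "bootstrap": bootstrap}


def build_subsamples(train_idx: Sequence[int], n_samples: int, method: str,
                     rng: random.Random) -> List[List[int]]:
    """Draw the n_samples subsamples from which one consolidated tree would be built."""
    try:
        draw = SAMPLERS[method]
    except KeyError:
        raise ValueError(f"unknown method: {method!r}") from None
    return [draw(train_idx, rng) for _ in range(n_samples)]


def repeated_kfold(n: int, k: int = 10, repeats: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `repeats` runs of k-fold cross-validation.

    Assumption: folds are assigned after a plain shuffle (not stratified by class).
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
        for i in range(k):
            test = folds[i]
            train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
            yield train, test


if __name__ == "__main__":
    rng = random.Random(1)
    n = 1000  # hypothetical data set size
    for method in ("subsample75", "bootstrap"):
        fractions = []
        for train, _test in repeated_kfold(n):
            for s in build_subsamples(train, n_samples=10, method=method, rng=rng):
                # Fraction of the training fold's distinct examples present in s.
                fractions.append(len(set(s)) / len(train))
        print(method, round(sum(fractions) / len(fractions), 3))
```

Running the script illustrates one concrete difference between the two schemes: a 75% subsample always contains exactly 75% of the distinct training examples, whereas a bootstrap sample of size n contains on average 1 - (1 - 1/n)^n, which is close to 1 - 1/e ≈ 63.2%, of them, the remaining draws being repeats. This is a property of the sampling schemes, not a result reported in the paper.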
Similar resources
Consolidated Trees: An Analysis of Structural Convergence
When different subsamples of the same data set are used to induce classification trees, the structures of the resulting classifiers differ considerably. The structural stability of the tree is of capital importance in many domains, such as illness diagnosis, fraud detection in different fields and customer behaviour analysis (marketing), where comprehensibility of the classifier is necessary...
Consolidated Technique of Response Surface Methodology and Data Envelopment Analysis for setting the parameters of meta-heuristic algorithms - Case study: Production Scheduling Problem
In this study, considering sequence-dependent setup times, we use Response Surface Methodology (RSM) to set the parameters of the genetic algorithm (GA) used to optimize the scheduling problem of n jobs on one machine (n/1). The aim is to find the most suitable parameters for increasing the efficiency of the proposed algorithm. First, a central composite d...
An Efficient Target Tracking Algorithm Based on Particle Filter and Genetic Algorithm
In this paper, we propose an efficient hybrid Particle Filter (PF) algorithm for video tracking that employs a genetic algorithm to solve the sample impoverishment problem. In the presented method, the object to be tracked is selected by a rectangular window inside which a few particles are scattered. The particles’ weights are calculated based on the similarity between feature vecto...
Using Simulated Annealing (SA), Evolutionary Algorithm To Determine Optimal Dimensions of Clay Core in Earth Dams
An earth dam is a homogeneous or non-homogeneous structure used to raise the water level or to supply water. An earth dam consists of several parts, one of the main ones being the clay core. It is necessary to choose an optimal impermeable core that reduces seepage through the dam body while remaining stable. The objective of this research is to optimize the geometry of the earth dam clay core such ...
Solving the Unconstrained Optimization Problems Using the Combination of Nonmonotone Trust Region Algorithm and Filter Technique
In this paper, we propose a new nonmonotone adaptive trust region method, equipped with the filter technique, for solving unconstrained optimization problems. In the proposed method, a nonmonotone technique is used. With this technique, the algorithm can take advantage of nonmonotone properties and solve the problems faster. Also, the filter that is used in...